{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# WebSurferAgent with Firecrawl Integration\n",
    "\n",
    "This notebook demonstrates how to use the WebSurferAgent with the Firecrawl tool for web scraping and crawling."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "First, import the necessary modules and set up the WebSurferAgent with Firecrawl."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from autogen.agents.experimental.websurfer import WebSurferAgent\n",
    "\n",
    "# Set up your API keys\n",
    "FIRECRAWL_API_KEY = os.getenv(\"FIRECRAWL_API_KEY\", \"your_firecrawl_api_key_here\")\n",
    "OPENAI_API_KEY = os.getenv(\"OPENAI_API_KEY\", \"your_openai_api_key_here\")\n",
    "\n",
    "# LLM configuration\n",
    "llm_config = {\n",
    "    \"model\": \"gpt-4\",\n",
    "    \"api_key\": OPENAI_API_KEY,\n",
    "    \"temperature\": 0.1,\n",
    "}\n",
    "\n",
    "print(\"✓ Setup complete\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "## Creating WebSurferAgent with Firecrawl\n",
    "\n",
    "Create a WebSurferAgent that uses Firecrawl as its web tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create WebSurferAgent with Firecrawl\n",
    "websurfer_agent = WebSurferAgent(\n",
    "    name=\"WebSurfer\",\n",
    "    llm_config=llm_config,\n",
    "    web_tool=\"firecrawl\",\n",
    "    web_tool_kwargs={\n",
    "        \"firecrawl_api_key\": FIRECRAWL_API_KEY,\n",
    "        # Optional: specify custom Firecrawl API URL for self-hosted instances\n",
    "        # \"firecrawl_api_url\": \"https://your-firecrawl-instance.com\",\n",
    "    },\n",
    "    system_message=\"You are a helpful web researcher. Use Firecrawl to scrape, crawl, search, and research websites to answer user questions.\",\n",
    ")\n",
    "\n",
    "print(f\"✓ WebSurferAgent created with {type(websurfer_agent.tool).__name__}\")\n",
    "print(\n",
    "    f\"✓ Available tool methods: {[method for method in dir(websurfer_agent.tool) if not method.startswith('_') and callable(getattr(websurfer_agent.tool, method))]}\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "## Available Firecrawl Methods\n",
    "\n",
    "The WebSurferAgent with Firecrawl tool provides access to all Firecrawl capabilities:\n",
    "\n",
    "1. **Scrape**: Extract content from a single URL\n",
    "2. **Crawl**: Recursively crawl a website starting from a URL\n",
    "3. **Map**: Discover URLs from a website\n",
    "4. **Search**: Search the web for content\n",
    "5. **Deep Research**: Perform comprehensive research on a topic with analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "## Example Usage\n",
    "\n",
    "Here are some example tasks you can perform with the WebSurferAgent using Firecrawl:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example 1: Simple web scraping\n",
    "scrape_message = \"Scrape the homepage of https://example.com and summarize the main content.\"\n",
    "\n",
    "# Example 2: Website crawling\n",
    "crawl_message = \"Crawl https://example.com and find all the pages related to documentation.\"\n",
    "\n",
    "# Example 3: Website mapping\n",
    "map_message = \"Map the structure of https://example.com and list all the main sections.\"\n",
    "\n",
    "# Example 4: Web search\n",
    "search_message = \"Search for recent articles about artificial intelligence and summarize the top 3 results.\"\n",
    "\n",
    "# Example 5: Deep research\n",
    "research_message = \"Perform deep research on 'sustainable energy trends 2024' and provide a comprehensive analysis.\"\n",
    "\n",
    "print(\"Example messages prepared. Run them individually with the websurfer_agent.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "## Running Tasks\n",
    "\n",
    "To run any of the above tasks, you would use the agent like this:\n",
    "\n",
    "```python\n",
    "# Create a user proxy agent to interact with the websurfer\n",
    "from autogen import UserProxyAgent\n",
    "\n",
    "user_proxy = UserProxyAgent(\n",
    "    name=\"User\",\n",
    "    human_input_mode=\"NEVER\",\n",
    "    code_execution_config=False,\n",
    ")\n",
    "\n",
    "# Start a conversation\n",
    "user_proxy.initiate_chat(\n",
    "    websurfer_agent,\n",
    "    message=scrape_message,\n",
    "    max_turns=1,\n",
    ")\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "## Key Features\n",
    "\n",
    "The WebSurferAgent with Firecrawl provides:\n",
    "\n",
    "- **Multiple web interaction methods**: scrape, crawl, map, search, and deep research\n",
    "- **Self-hosted support**: Connect to your own Firecrawl instance via `firecrawl_api_url`\n",
    "- **Flexible configuration**: Pass any Firecrawl parameters through `web_tool_kwargs`\n",
    "- **Consistent API**: Same interface as other WebSurferAgent tools (Tavily, DuckDuckGo, etc.)\n",
    "- **Rich content extraction**: Supports multiple output formats (markdown, HTML)\n",
    "- **Advanced filtering**: Include/exclude patterns, custom headers, timeouts\n",
    "\n",
    "This makes the WebSurferAgent a powerful tool for web research and content extraction tasks."
   ]
  }
 ],
 "metadata": {
  "front_matter": {
   "description": "This notebook demonstrates how to use the WebSurferAgent with the Firecrawl tool for web scraping and crawling.",
   "tags": [
    "firecrawl",
    "tool",
    "scraping"
   ]
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
